Random Forest Ensemble Visualization
Author
Abstract
The random forest model has become a very popular data mining algorithm because of its high predictive accuracy and its simplicity of execution. Its downside is that the model, which consists of a collection of classification trees, is difficult to interpret. Our proposed visualization aggregates the collection of trees based on the number of times each feature appears at each node position. This derived attribute provides a means of analyzing feature interactions, which traditional methods such as variable importance cannot reveal. In addition, we propose a method of quantifying the ensemble of trees based on the correlation of their class predictions.
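The two quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every detail of it is an assumption made for illustration: the toy tree encoding (an internal node is a tuple `(feature, left, right)`, a leaf is a class label), the heap-style node numbering (root is 1, children of position p are 2p and 2p + 1), the binary 0/1 features and labels, and the use of Pearson correlation averaged over all pairs of trees.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def count_feature_positions(forest):
    """Count how often each feature appears at each node position in the forest."""
    counts = Counter()
    for tree in forest:
        stack = [(tree, 1)]  # (node, heap-style position); the root is position 1
        while stack:
            node, pos = stack.pop()
            if isinstance(node, tuple):  # internal node; leaves are plain labels
                feature, left, right = node
                counts[(pos, feature)] += 1
                stack.append((left, 2 * pos))
                stack.append((right, 2 * pos + 1))
    return counts


def predict(tree, x):
    """Follow the right branch when x[feature] is 1, the left branch when it is 0."""
    node = tree
    while isinstance(node, tuple):
        feature, left, right = node
        node = right if x[feature] else left
    return node


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mean_a, mean_b = mean(a), mean(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def mean_prediction_correlation(forest, samples):
    """Average pairwise correlation between the trees' class predictions."""
    preds = [[predict(tree, x) for x in samples] for tree in forest]
    return mean(pearson(a, b) for a, b in combinations(preds, 2))


# Hypothetical three-tree forest over two binary features, "a" and "b".
forest = [
    ("a", ("b", 0, 1), 1),
    ("a", 0, ("b", 0, 1)),
    ("b", 0, ("a", 0, 1)),
]
samples = [{"a": a, "b": b} for a in (0, 1) for b in (0, 1)]

counts = count_feature_positions(forest)
avg_corr = mean_prediction_correlation(forest, samples)
```

In this toy forest, feature "a" is the root split in two of the three trees, so `counts[(1, "a")]` is 2. Reading the counts across neighbouring positions (for example, "b" directly below "a") is one way such a table could hint at feature interactions. A low `avg_corr` would suggest the trees disagree often, while a value near 1 would suggest they are largely redundant.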
Similar resources
Application of ensemble learning techniques to model the atmospheric concentration of SO2
In view of pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Improving Random Forests
Random forests are one of the most successful ensemble methods, exhibiting performance on the level of boosting and support vector machines. The method is fast, robust to noise, does not overfit and offers possibilities for explanation and visualization of its output. We investigate some possibilities to increase strength or decrease correlation of individual trees in the forest. Using sever...
Performance Assessment and Interpretation of Random Forests by Three-dimensional Visualizations
Ensemble learning techniques, and in particular Random Forests, have been among the most successful machine learning approaches of the last decade. Despite their success, there are barely any suitable visualizations of Random Forests that allow a fast and accurate understanding of how well they perform a certain task and what leads to this performance. This paper proposes an exemplar-driven visu...
ADABOOST ENSEMBLE ALGORITHMS FOR BREAST CANCER CLASSIFICATION
With advances in technology, many different tumor features have been collected for Breast Cancer (BC) diagnosis. Processing such large data sets poses challenges, including high storage requirements and the time required for access and processing. The objective of this paper is to classify BC based on the extracted tumor features. To extract useful information and diagnose the tumo...
Forest Garrote
Abstract: Variable selection for high-dimensional linear models has received a lot of attention lately, mostly in the context of ℓ1-regularization. Part of the attraction is the variable selection effect: parsimonious models are obtained, which are very suitable for interpretation. In terms of predictive power, however, these regularized linear models are often slightly inferior to machine lear...